TIL: For long-lived LLM sessions, swapping KV Cache to RAM is ~10x faster than recalculating it. Why isn't this a standard feature?
🧠Local LLM
Your Transformer is Secretly an EOT Solver
📊Prometheus
KAITO and KubeFleet: Projects Solving AI Inference at Scale
thenewstack.io·1d
☸️Kubernetes
Everything About Transformers
📊Prometheus
Yes, you should understand backprop (2016)
🧠Local LLM
A Minimal Route to Transformer Attention
📊Prometheus
Handbook of Satisfiability (2021)
🧠Local LLM
From Lossy to Lossless Reasoning
🧠Local LLM
How fast can an LLM go?
📊Prometheus
Opportunistically Parallel Lambda Calculus
⚡Tokio